week 6: integers and other monsters

ordered categories

Ordered categorical outcomes

Ordered categories are variables that are discrete, like a count, except that the values merely indicate different ordered levels. Most Likert-scale questions are ordered categories, even though we often pretend they're continuous (see polychoric correlations).

If we want to model an outcome as an ordered categorical variable, what we're really asking is how moving along an associated predictor shifts predictions progressively through the outcome categories, in sequence. We must therefore model our outcomes in the right order.

To do so, we’ll use the principles of the generalized linear model discussed last week. That is, we use a link function – the CUMULATIVE LINK FUNCTION – to model the probability of observing a given value or any smaller value.
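To make the link concrete, here is a minimal sketch in plain R (no model, made-up probabilities) of the round trip between cumulative probabilities and the log-cumulative-odds scale on which the cut points live:

```r
# hypothetical cumulative probabilities Pr(response <= k) for 4 categories
p_cum <- c(0.1, 0.3, 0.6, 1.0)

# cut points live on the log-cumulative-odds scale; the last one is +Inf,
# so we only need K - 1 of them
alpha <- qlogis(p_cum[-length(p_cum)])

# the inverse link recovers the cumulative probabilities
plogis(alpha)
```

Because the cumulative probabilities are increasing, the cut points come out ordered automatically.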

example: the trolley problem

data(Trolley, package = "rethinking")
trolley <- Trolley
glimpse(trolley)
Rows: 9,930
Columns: 12
$ case      <fct> cfaqu, cfbur, cfrub, cibox, cibur, cispe, fkaqu, fkboa, fkbo…
$ response  <int> 4, 3, 4, 3, 3, 3, 5, 4, 4, 4, 4, 4, 4, 5, 4, 4, 4, 4, 4, 3, …
$ order     <int> 2, 31, 16, 32, 4, 9, 29, 12, 23, 22, 27, 19, 14, 3, 18, 15, …
$ id        <fct> 96;434, 96;434, 96;434, 96;434, 96;434, 96;434, 96;434, 96;4…
$ age       <int> 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, …
$ male      <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ edu       <fct> Middle School, Middle School, Middle School, Middle School, …
$ action    <int> 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, …
$ intention <int> 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, …
$ contact   <int> 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ story     <fct> aqu, bur, rub, box, bur, spe, aqu, boa, box, bur, car, spe, …
$ action2   <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, …
trolley %>% 
  ggplot(aes( x=response )) +
  geom_histogram(binwidth = 1)  # responses are integers 1-7
trolley %>%
  count(response) %>% 
  mutate(pr_k     = n/nrow(trolley),
         cum_pr_k = cumsum(pr_k)) %>% 
  ggplot(aes(x = response, y = cum_pr_k)) +
  geom_point(size = 3, color = "#1c5253") +
  geom_line(color = "#1c5253", alpha = .3) + 
  labs(x = "response",
       y = "cumulative probability")
# primary data
trolley_plot <-
  trolley %>%
  count(response) %>%
  mutate(pr_k     = n / nrow(trolley),
         cum_pr_k = cumsum(n / nrow(trolley))) %>% 
  mutate(discrete_probability = ifelse(response == 1, cum_pr_k, cum_pr_k - pr_k))

# annotation
text <-
  tibble(text     = 1:7,
         response = seq(from = 1.25, to = 7.25, by = 1),
         cum_pr_k = trolley_plot$cum_pr_k - .065)

trolley_plot %>% 
  ggplot(aes(x = response, y = cum_pr_k,
             color = cum_pr_k, fill = cum_pr_k)) +
  geom_line() +
  geom_point(shape = 21, colour = "grey92", 
             size = 2.5, stroke = 1) +
  geom_linerange(aes(ymin = 0, ymax = cum_pr_k),
                 alpha = 1/2) +
  geom_linerange(aes(x = response + .025,
                     ymin = ifelse(response == 1, 0, discrete_probability), 
                     ymax = cum_pr_k),
                 color = "black") +
  # number annotation
  geom_text(data = text, 
            aes(label = text),
            size = 4) +
  scale_x_continuous(breaks = 1:7) +
  scale_y_continuous("cumulative proportion", breaks = c(0, .5, 1), 
                     limits = c(0, 1.1)) +
  theme(axis.ticks = element_blank(),
        axis.title.y = element_text(angle = 90),
        legend.position = "none")

On to the modeling. We’ll start with an intercept-only model.

\[\begin{align*} R_i &\sim \text{Categorical}(p) \\ \text{logit}(p_k) &= \alpha_k - \phi_i \\ \phi_i &= 0 \\ \alpha_k &\sim \text{Normal}(0, 1.5) \end{align*}\]

Remember, \(\phi\) doesn’t have an intercept because there’s an \(\alpha_k\) for each cut point.
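To see what the cut points buy us, here is a small sketch (made-up \(\alpha_k\) values, roughly in the range we will estimate below) showing how \(K - 1\) cut points on the log-cumulative-odds scale imply \(K\) category probabilities:

```r
# hypothetical cut points on the log-cumulative-odds scale
alpha <- c(-1.9, -1.3, -0.7, 0.25, 0.9, 1.8)

# cumulative probabilities, padding Pr(response <= 7) = 1
p_cum <- c(plogis(alpha), 1)

# per-category probabilities are successive differences
p_k <- diff(c(0, p_cum))
p_k
```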

m1 <- brm(
  data = trolley,
  family = cumulative, 
  response ~ 1, 
  prior = c( prior(normal(0, 1.5), class = Intercept) ),
  iter=2000, warmup=1000, cores=4, chains=4,
  file=here("files/data/generated_data/m61.1")
)
summary(m1)
 Family: cumulative 
  Links: mu = logit; disc = identity 
Formula: response ~ 1 
   Data: trolley (Number of observations: 9930) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1]    -1.92      0.03    -1.98    -1.86 1.00     3004     3065
Intercept[2]    -1.27      0.02    -1.31    -1.22 1.00     4085     3449
Intercept[3]    -0.72      0.02    -0.76    -0.68 1.00     4532     3499
Intercept[4]     0.25      0.02     0.21     0.29 1.00     4697     3614
Intercept[5]     0.89      0.02     0.85     0.93 1.00     4370     3119
Intercept[6]     1.77      0.03     1.71     1.82 1.00     5091     3618

Further Distributional Parameters:
     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc     1.00      0.00     1.00     1.00   NA       NA       NA

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
m1 %>% 
  fixef() %>% 
  inv_logit_scaled() %>% 
  round(digits = 3)
             Estimate Est.Error  Q2.5 Q97.5
Intercept[1]    0.128     0.508 0.122 0.135
Intercept[2]    0.220     0.506 0.212 0.228
Intercept[3]    0.328     0.505 0.318 0.337
Intercept[4]    0.562     0.505 0.552 0.571
Intercept[5]    0.709     0.506 0.700 0.718
Intercept[6]    0.854     0.507 0.847 0.861

Note: the standard errors (Est.Error) are not valid after this transformation: the inverse logit of a posterior standard deviation is not the standard deviation on the probability scale. (Just look at how they compare to the quantiles of the distribution.) Summarizing draws from the posterior is the best solution.
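A quick sketch of why, using simulated draws whose mean and sd match Intercept[1] above:

```r
set.seed(1)
# mimic the Intercept[1] posterior on the logit scale
draws <- rnorm(4000, mean = -1.92, sd = 0.03)

plogis(0.03)       # ~0.51: the inverse logit of Est.Error, which is meaningless
sd(plogis(draws))  # ~0.003: the honest posterior sd on the probability scale
```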

as_draws_df(m1) %>% 
  mutate_at(vars(starts_with("b_")), inv_logit_scaled) %>% 
  pivot_longer(starts_with("b_")) %>% 
  group_by(name) %>% 
  summarise(mean = mean(value),
            sd   = sd(value),
            ll   = quantile(value, probs = .025),
            ul   = quantile(value, probs = .975))
# A tibble: 6 × 5
  name            mean      sd    ll    ul
  <chr>          <dbl>   <dbl> <dbl> <dbl>
1 b_Intercept[1] 0.128 0.00338 0.122 0.135
2 b_Intercept[2] 0.220 0.00411 0.212 0.228
3 b_Intercept[3] 0.328 0.00468 0.318 0.337
4 b_Intercept[4] 0.562 0.00499 0.552 0.571
5 b_Intercept[5] 0.709 0.00456 0.700 0.718
6 b_Intercept[6] 0.854 0.00351 0.847 0.861

exercise

Create a plot showing the posterior predictive distribution for the intercept-only model (m1). Can you visualize both the posterior and the observed data?

solution

predicted_draws(m1, newdata = data.frame(1)) %>% 
  count(.prediction) %>%
  mutate(pr_k = n / sum(n)) %>% 
  ggplot(aes(x = .prediction, y = pr_k)) +
  geom_col(fill = "#1c5253", alpha = 0.7) +
  geom_point(data = trolley_plot, aes(x = response), 
             color = "#3d405b", size = 2) +
  labs(x = "response",
       y = "probability",
       title = "Posterior Predictive Distribution") 

adding predictors

Now let’s add the features of the story – contact, action, and intention – as predictors in our model. Let’s expand our mathematical model. Here’s our original:

\[\begin{align*} R_i &\sim \text{Categorical}(p) \\ \text{logit}(p_k) &= \alpha_k - \phi_i \\ \phi_i &= 0 \\ \alpha_k &\sim \text{Normal}(0, 1.5) \end{align*}\]

McElreath proposes the following formula for \(\phi_i\):

\[ \phi_i = \beta_1\text{action}_i + \beta_2\text{contact}_i + (\beta_3 + \beta_4\text{action}_i + \beta_5\text{contact}_i)\text{intention}_i \]
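Before fitting, it can help to see the arithmetic of the interactions. A sketch with hypothetical coefficients (loosely in the neighborhood of the estimates below): intention gets a different slope depending on action and contact.

```r
# hypothetical coefficients, for illustration only
b1 <- -0.5   # action
b2 <- -0.35  # contact
b3 <- -0.3   # intention
b4 <- -0.4   # action x intention
b5 <- -1.2   # contact x intention

phi <- function(A, C, I) b1*A + b2*C + (b3 + b4*A + b5*C)*I

phi(0, 0, 1)  # intention alone: just b3
phi(1, 0, 1)  # intention + action: b1 + b3 + b4
phi(0, 1, 1)  # intention + contact: the largest penalty
```

Lower \(\phi_i\) pushes predicted responses toward the disapproving end of the scale.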

m2 <- brm(
  data = trolley,
  family = cumulative, 
  response ~ 1 + action + contact + intention + action:intention + contact:intention, 
  prior = c( prior(normal(0, 1.5), class = Intercept),
             prior(normal(0, 0.5), class = b)),
  iter=2000, warmup=1000, cores=4, chains=4,
  file=here("files/data/generated_data/m61.2")
)

m2
 Family: cumulative 
  Links: mu = logit; disc = identity 
Formula: response ~ 1 + action + contact + intention + action:intention + contact:intention 
   Data: trolley (Number of observations: 9930) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1]         -2.64      0.05    -2.74    -2.53 1.00     2973     2727
Intercept[2]         -1.94      0.05    -2.04    -1.85 1.00     3113     2791
Intercept[3]         -1.35      0.05    -1.44    -1.26 1.00     2979     3029
Intercept[4]         -0.31      0.04    -0.40    -0.23 1.00     2785     2912
Intercept[5]          0.36      0.04     0.27     0.44 1.00     2886     3000
Intercept[6]          1.26      0.05     1.17     1.35 1.00     3129     2816
action               -0.48      0.05    -0.58    -0.37 1.00     3072     2837
contact              -0.35      0.07    -0.48    -0.21 1.00     3031     2820
intention            -0.30      0.06    -0.41    -0.18 1.00     2694     2861
action:intention     -0.43      0.08    -0.59    -0.27 1.00     2949     3105
contact:intention    -1.23      0.10    -1.42    -1.05 1.00     2710     2794

Further Distributional Parameters:
     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc     1.00      0.00     1.00     1.00   NA       NA       NA

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
as_draws_df(m2) %>% 
  select(starts_with("b_")) %>% 
  select(-matches("[0-9]")) %>% 
  pivot_longer(everything()) %>% 
  mutate(
    name = str_remove(name, "b_"),
    name = str_replace(name, ":", " x ")
  ) %>% 
  ggplot( aes(x = value, y = name) ) +
  geom_vline( aes(xintercept = 0), linetype = "dashed", alpha = .3 ) +
  stat_dist_halfeye(fill = "#5e8485") +
  scale_x_continuous("marginal posterior", breaks = -5:0 / 4) +
  scale_y_discrete(NULL) +
  coord_cartesian(xlim = c(-1.4, 0))
nd <- 
  trolley %>% 
  distinct(action, contact, intention) %>% 
  mutate(combination = str_c(action, contact, intention, sep = "_"))

f <- predicted_draws( m2 , newdata = nd )

# what have we done?
f %>% str()
gropd_df [24,000 × 9] (S3: grouped_df/tbl_df/tbl/data.frame)
 $ action     : int [1:24000] 0 0 0 0 0 0 0 0 0 0 ...
 $ contact    : int [1:24000] 1 1 1 1 1 1 1 1 1 1 ...
 $ intention  : int [1:24000] 0 0 0 0 0 0 0 0 0 0 ...
 $ combination: chr [1:24000] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
 $ .row       : int [1:24000] 1 1 1 1 1 1 1 1 1 1 ...
 $ .chain     : int [1:24000] NA NA NA NA NA NA NA NA NA NA ...
 $ .iteration : int [1:24000] NA NA NA NA NA NA NA NA NA NA ...
 $ .draw      : int [1:24000] 1 2 3 4 5 6 7 8 9 10 ...
 $ .prediction: Factor w/ 7 levels "1","2","3","4",..: 2 7 3 6 5 4 7 2 7 3 ...
 - attr(*, "groups")= tibble [6 × 6] (S3: tbl_df/tbl/data.frame)
  ..$ action     : int [1:6] 0 0 0 0 1 1
  ..$ contact    : int [1:6] 0 0 1 1 0 0
  ..$ intention  : int [1:6] 0 1 0 1 0 1
  ..$ combination: chr [1:6] "0_0_0" "0_0_1" "0_1_0" "0_1_1" ...
  ..$ .row       : int [1:6] 4 6 1 2 3 5
  ..$ .rows      : list<int> [1:6] 
  .. ..$ : int [1:4000] 12001 12002 12003 12004 12005 12006 12007 12008 12009 12010 ...
  .. ..$ : int [1:4000] 20001 20002 20003 20004 20005 20006 20007 20008 20009 20010 ...
  .. ..$ : int [1:4000] 1 2 3 4 5 6 7 8 9 10 ...
  .. ..$ : int [1:4000] 4001 4002 4003 4004 4005 4006 4007 4008 4009 4010 ...
  .. ..$ : int [1:4000] 8001 8002 8003 8004 8005 8006 8007 8008 8009 8010 ...
  .. ..$ : int [1:4000] 16001 16002 16003 16004 16005 16006 16007 16008 16009 16010 ...
  .. ..@ ptype: int(0) 
  ..- attr(*, ".drop")= logi TRUE
f %>% 
  as.data.frame() %>% 
  mutate(
    across( c(action, contact, intention) , 
            as.character),
    action  = str_c("action = ",  action),
    contact = str_c("contact = ", contact)) %>% 
  count(action, contact, intention, .prediction) %>% 
  ggplot( aes(x=.prediction, y=n, fill=intention) )+
  geom_bar( stat="identity", position="dodge" ) +
  facet_grid(~action+contact) +
  labs( x="response", 
        y="count" )

exercise

Create a new model that includes only the main effects of action, contact, and intention (no interactions). Then:

  1. Compare the coefficients between this model and m2 (the model with interactions)
  2. Compare the fits of these models using PSIS and WAIC.

solution

# Fit simpler model
m2_simple <- brm(
  data = trolley,
  family = cumulative, 
  response ~ 1 + action + contact + intention, 
  prior = c(prior(normal(0, 1.5), class = Intercept),
            prior(normal(0, 0.5), class = b)),
  iter = 2000, warmup = 1000, cores = 4, chains = 4,
  file=here("files/data/generated_data/m61.2simple")
)

# Compare coefficients
bind_rows(
  fixef(m2)        %>% as_tibble(rownames = "term") %>% mutate(model = "with interactions"),
  fixef(m2_simple) %>% as_tibble(rownames = "term") %>% mutate(model = "main effects only")
) %>% 
  # note: labeling each model before binding is safer than rep(..., each = n()/2),
  # since the two models have different numbers of coefficients (11 vs 9 rows)
  ggplot(aes(x = term, y = Estimate, color = model)) +
  geom_point() +
  geom_errorbar(aes(ymin = Q2.5, ymax = Q97.5), width = 0.2) +
  coord_flip() +
  theme(legend.position = "top")

# Plot predictions
predicted_draws(m2_simple, newdata = nd) %>% 
  as.data.frame() %>% 
  mutate(
    across(c(action, contact, intention), as.character),
    facet_title = str_c("Action: ", action, 
                        "\nContact: ", contact)
  ) %>% 
  count(facet_title, intention, .prediction) %>% 
  ggplot(aes(x = .prediction, y = n, fill = intention)) +
  geom_bar(stat = "identity", position = "dodge") +
  facet_wrap(~facet_title) +
  labs(x = "response", y = "count")

m2  <-       add_criterion(m2,        "loo")
m2_simple <- add_criterion(m2_simple, "loo")

loo_compare(m2, m2_simple, criterion = "loo") %>% 
  print(simplify = F)
          elpd_diff se_diff elpd_loo se_elpd_loo p_loo se_p_loo looic   se_looic
m2              0.0     0.0 -18464.7        40.4  11.1      0.1 36929.4     80.8
m2_simple     -80.4    12.4 -18545.1        38.0   9.1      0.0 37090.2     75.9
m2  <-       add_criterion(m2,        "waic")
m2_simple <- add_criterion(m2_simple, "waic")

loo_compare(m2, m2_simple, criterion = "waic") %>% 
  print(simplify = F)
          elpd_diff se_diff elpd_waic se_elpd_waic p_waic se_p_waic waic    se_waic
m2              0.0     0.0  -18464.7         40.4   11.0       0.1 36929.4    80.8
m2_simple     -80.4    12.4  -18545.1         38.0    9.1       0.0 37090.2    75.9

Ordered categorical predictors

distinct(trolley, edu)
                   edu
1        Middle School
2    Bachelor's Degree
3         Some College
4      Master's Degree
5 High School Graduate
6      Graduate Degree
7     Some High School
8    Elementary School
trolley <-
  trolley %>% 
  mutate(edu_new = 
           recode(edu,
                  "Elementary School" = 1,
                  "Middle School" = 2,
                  "Some High School" = 3,
                  "High School Graduate" = 4,
                  "Some College" = 5, 
                  "Bachelor's Degree" = 6,
                  "Master's Degree" = 7,
                  "Graduate Degree" = 8) %>% 
           as.integer())

To incorporate this variable as a MONOTONIC CATEGORICAL PREDICTOR, we’ll set up the notation so that each step up in value comes with its own incremental (marginal) effect on the outcome.

\[ \phi_i = \sum_{j=1}^7 \delta_j \]

We only need 7 \(\delta\) parameters because the first category is absorbed into the intercept for \(\phi\). So our full formula for this parameter is:

\[ \phi_i = \beta_1\text{action}_i + \beta_2\text{contact}_i + \beta_3\text{intention}_i + \beta_E\sum_{j=0}^{E_i-1} \delta_j \]

where \(\beta_E\) is the maximum effect of education: the \(\delta_j\) form a simplex (they sum to 1, with \(\delta_0 = 0\)), so each \(\delta_j\) is the fraction of that maximum effect contributed by the corresponding step up in education.
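A small numeric sketch of the construction, treating the posterior means we will see from m3 below as fixed numbers: the \(\delta_j\) simplex turns \(\beta_E\) into a cumulative, level-by-level effect.

```r
# posterior means of the simplex parameters from m3 (below); they sum to 1
delta <- c(0.26, 0.14, 0.19, 0.16, 0.04, 0.09, 0.12)
b_E   <- -0.05  # total (maximum) education effect

# effect at education level E = 1..8: b_E times the sum of the first E-1 deltas
edu_effect <- b_E * c(0, cumsum(delta))
edu_effect  # monotonically decreasing from 0 down to b_E
```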

\[\begin{align*} \text{response}_i &\sim \text{Categorical}(\mathbf{p}) \\ \text{logit}(p_k) &= \alpha_k - \phi_i \\ \phi_i &= \beta_1\text{action}_i + \beta_2\text{contact}_i + \beta_3\text{intention}_i + \beta_4 \text{ mo(edu\_new}_i, \boldsymbol{\delta}) \\ \alpha_k &\sim \text{Normal}(0, 1.5) \\ \beta_1, \ldots, \beta_3 &\sim \text{Normal}(0, 1) \\ \beta_4 &\sim \text{Normal}(0, 0.143) \\ \boldsymbol{\delta} &\sim \text{Dirichlet}(2, 2, 2, 2, 2, 2, 2), \end{align*}\]

m3 <- 
  brm(data = trolley, 
      family = cumulative,
      response ~ 1 + action + contact + intention + mo(edu_new),  # note the `mo()` syntax
      prior = c(prior(normal(0, 1.5), class = Intercept),
                prior(normal(0, 1), class = b),
                # note the new kinds of prior statements
                prior(normal(0, 0.143), class = b, coef = moedu_new),
                prior(dirichlet(2, 2, 2, 2, 2, 2, 2), class = simo, coef = moedu_new1)),
      iter = 2000, warmup = 1000, cores = 4, chains = 4, 
      seed = 12,
      file = here("files/data/generated_data/m61.3"))
m3
 Family: cumulative 
  Links: mu = logit; disc = identity 
Formula: response ~ 1 + action + contact + intention + mo(edu_new) 
   Data: trolley (Number of observations: 9930) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1]    -3.13      0.17    -3.53    -2.84 1.00     1707     2012
Intercept[2]    -2.45      0.17    -2.83    -2.16 1.00     1702     1934
Intercept[3]    -1.87      0.17    -2.25    -1.58 1.00     1708     1804
Intercept[4]    -0.85      0.17    -1.23    -0.56 1.00     1731     1956
Intercept[5]    -0.18      0.17    -0.56     0.11 1.00     1718     1925
Intercept[6]     0.73      0.17     0.35     1.01 1.00     1737     1988
action          -0.71      0.04    -0.78    -0.63 1.00     4604     3182
contact         -0.96      0.05    -1.06    -0.86 1.00     4521     2761
intention       -0.72      0.04    -0.79    -0.65 1.00     4514     2992
moedu_new       -0.05      0.03    -0.11    -0.00 1.00     1705     1866

Monotonic Simplex Parameters:
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
moedu_new1[1]     0.26      0.15     0.04     0.59 1.00     2292     3104
moedu_new1[2]     0.14      0.09     0.02     0.35 1.00     4405     2501
moedu_new1[3]     0.19      0.11     0.03     0.44 1.00     4005     2613
moedu_new1[4]     0.16      0.09     0.03     0.39 1.00     3636     2564
moedu_new1[5]     0.04      0.05     0.00     0.15 1.00     2606     1894
moedu_new1[6]     0.09      0.06     0.01     0.25 1.00     3747     2484
moedu_new1[7]     0.12      0.07     0.02     0.29 1.00     3829     2596

Further Distributional Parameters:
     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc     1.00      0.00     1.00     1.00   NA       NA       NA

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Let’s see the effect of education after controlling for story parameters.

nd <- distinct(
  trolley, 
  action, contact, intention, edu_new
)
predicted_draws(m3, nd) %>% 
  ungroup() %>% 
  count(edu_new, .prediction) %>% 
  with_groups(edu_new, mutate, total=sum(n)) %>% 
  mutate(pk = n/total,
         .prediction = as.numeric(.prediction),
         edu_new = as.factor(edu_new)) %>% 
  ggplot(aes(x = .prediction,  y = pk, color = edu_new)) +
  geom_point() +
  geom_line() +
  labs( x="response", 
        y="probability" ) +
  theme(legend.position = "top")

exercise

Create a new model that includes education as a regular categorical predictor (not monotonic) and compare it to m3. Your tasks:

  1. Fit the model using the original edu variable (not edu_new)
  2. Create a plot comparing the predicted probabilities between the two models
  3. Create a side-by-side visual comparison of the education effects
  4. Interpret whether the monotonic constraint appears to improve the model fit

solution

# Fit model with regular categorical education
m3_cat <- brm(
  data = trolley,
  family = cumulative,
  response ~ 1 + action + contact + intention + edu,
  prior = c(prior(normal(0, 1.5), class = Intercept),
            prior(normal(0, 1), class = b)),
  iter = 2000, warmup = 1000, cores = 4, chains = 4,
  file=here("files/data/generated_data/m61.3cat")
)

m3_cat
 Family: cumulative 
  Links: mu = logit; disc = identity 
Formula: response ~ 1 + action + contact + intention + edu 
   Data: trolley (Number of observations: 9930) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
                      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1]             -2.96      0.06    -3.06    -2.85 1.00     2652     2505
Intercept[2]             -2.28      0.05    -2.37    -2.18 1.00     2864     2724
Intercept[3]             -1.69      0.05    -1.78    -1.60 1.00     2906     2652
Intercept[4]             -0.66      0.04    -0.75    -0.57 1.00     2938     3064
Intercept[5]              0.01      0.04    -0.07     0.10 1.00     3128     3014
Intercept[6]              0.92      0.05     0.83     1.02 1.00     3422     3240
action                   -0.71      0.04    -0.79    -0.63 1.00     3688     2941
contact                  -0.96      0.05    -1.06    -0.86 1.00     4052     2989
intention                -0.72      0.04    -0.79    -0.65 1.00     4802     3222
eduElementarySchool       0.90      0.21     0.49     1.30 1.00     5182     2776
eduGraduateDegree        -0.18      0.06    -0.30    -0.05 1.00     3974     2930
eduHighSchoolGraduate    -0.06      0.07    -0.19     0.07 1.00     4515     3103
eduMastersDegree         -0.11      0.06    -0.22    -0.00 1.00     4173     3194
eduMiddleSchool          -0.26      0.15    -0.54     0.03 1.00     5264     3190
eduSomeCollege           -0.31      0.05    -0.40    -0.22 1.00     4140     3030
eduSomeHighSchool         0.13      0.09    -0.05     0.31 1.00     5128     3042

Further Distributional Parameters:
     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc     1.00      0.00     1.00     1.00   NA       NA       NA

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

solution

# Create predictions for both models
p1 <- predicted_draws(m3, nd) %>% 
  ungroup() %>% 
  count(edu_new, .prediction) %>% 
  with_groups(edu_new, mutate, total = sum(n)) %>% 
  mutate(pk = n/total,
         .prediction = as.numeric(.prediction),
         edu_new = as.factor(edu_new)) %>% 
  ggplot(aes(x = .prediction, y = pk, color = edu_new)) +
  geom_point() +
  geom_line() +
  labs(x = "response", y = "probability", title = "Monotonic Model") +
  theme(legend.position = "top")

p2 <- predicted_draws(m3_cat, newdata = distinct(trolley, action, contact, intention, edu)) %>% 
  ungroup() %>% 
  count(edu, .prediction) %>% 
  with_groups(edu, mutate, total = sum(n)) %>% 
  mutate(pk = n/total,
         .prediction = as.numeric(.prediction)) %>% 
  ggplot(aes(x = .prediction, y = pk, color = edu)) +
  geom_point() +
  geom_line() +
  labs(x = "response", y = "probability", title = "Categorical Model") +
  theme(legend.position = "top")

# Combine plots
p1 + p2